[GLUTEN-3598][CH] Support to config the hash algorithm for the ch shuffle hash partitioner #3604
Run Gluten Clickhouse CI
LGTM
LGTM
murmurHash3_32 148025
cityHash64 152058
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
What changes were proposed in this pull request?
Currently the CH shuffle hash partitioner uses cityHash64, which differs from the hash algorithm used by vanilla Spark. When one side of a join's shuffle falls back to vanilla Spark, the CH side and the vanilla Spark side compute different hash ids for the same key. This PR adds a configuration to control the hash algorithm used by the CH shuffle hash partitioner.
Close #3598.
(Fixes: #3598)
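To illustrate the mismatch described above, here is a minimal Python sketch, not the Gluten or ClickHouse implementation. It assumes a Spark-style 32-bit Murmur3 hash of an integer key with seed 42, followed by a non-negative modulo over the partition count. `standin_hash64` is a hypothetical 64-bit FNV-1a hash used only as a stand-in for "some other algorithm"; it is not cityHash64. All function names are invented for this example.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32_int(value, seed=42):
    """Murmur3 x86_32 of one 4-byte integer (assumed Spark-style, seed 42)."""
    k1 = ((value & MASK32) * 0xCC9E2D51) & MASK32
    k1 = _rotl32(k1, 15)
    k1 = (k1 * 0x1B873593) & MASK32

    h1 = (seed & MASK32) ^ k1
    h1 = _rotl32(h1, 13)
    h1 = (h1 * 5 + 0xE6546B64) & MASK32

    # Finalization: mix in the input length (4 bytes), then avalanche.
    h1 ^= 4
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    h1 ^= h1 >> 16

    # Reinterpret as a signed 32-bit integer, as a JVM int would be.
    return h1 - (1 << 32) if h1 & 0x80000000 else h1


def standin_hash64(value):
    """Hypothetical stand-in for a different 64-bit hash (FNV-1a, NOT cityHash64)."""
    h = 0xCBF29CE484222325
    for b in (value & MASK64).to_bytes(8, "little"):
        h ^= b
        h = (h * 0x100000001B3) & MASK64
    return h


def partition_id(hash_fn, key, num_partitions):
    """Map a key to a partition; Python's % is non-negative for a positive modulus."""
    return hash_fn(key) % num_partitions


def mismatched_keys(keys, num_partitions, left_hash, right_hash):
    """Keys that the two sides of a shuffle would send to different partitions."""
    return [
        k for k in keys
        if partition_id(left_hash, k, num_partitions)
        != partition_id(right_hash, k, num_partitions)
    ]


if __name__ == "__main__":
    keys = range(100)
    mixed = mismatched_keys(keys, 8, murmur3_32_int, standin_hash64)
    same = mismatched_keys(keys, 8, murmur3_32_int, murmur3_32_int)
    print(f"different algorithms: {len(mixed)} of 100 keys land in different partitions")
    print(f"same algorithm:       {len(same)} of 100 keys land in different partitions")
```

When both sides use the same algorithm, every key is routed to the same partition on both sides, which is the situation the new configuration is meant to make possible when one side of the join falls back.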
How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)